My Tran, Alex Park, Esmay Muniz
2022-11-30
Link here for the murder data set
This dataset contains Homicide Reports information for the years 1980-2014. Through our analysis, we visualize the data and identify correlations among these murders.
Domain Question: What factors are related to the motives and behaviors of the killers?
Other questions
What weapon was used the most?
What state had the most kills?
Do the victims know their perpetrator?
How does gender play a role in homicide incidents?
library(tibble) # tibbles (modern data frames)
library(tidyr) # tidying data
library(rmarkdown) # dynamic documents
library(ggplot2) # data visualization
library(dplyr) # data manipulation
library(shiny) # interactive web apps and dynamic visuals
library(prettydoc) # pretty HTML themes for R Markdown documents
library(knitr) # dynamic report generation
library(tidyverse) # core tidyverse packages (also loads several of the above)
library(hms) # time-of-day values
library(kableExtra) # complex tables: kbl(), kable_paper(), scroll_box()
library(tigris) # US Census shapefiles used to make state maps
# added libraries for other graphs
library(plotly)
library(rjson)
library(leaflet)
library(leaflet.providers)
library(maps)
library(viridis)
library(viridisLite)
library(sp)
library(quantmod)
library(plot3D)
library(sf)
library(RColorBrewer)
library(gganimate)
Our original dataset is “Homicide Report” from Kaggle. First, we take a look at the data.
## Rows: 638,454
## Columns: 24
## $ Record.ID <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1…
## $ Agency.Code <chr> "AK00101", "AK00101", "AK00101", "AK00101", "AK0…
## $ Agency.Name <chr> "Anchorage", "Anchorage", "Anchorage", "Anchorag…
## $ Agency.Type <chr> "Municipal Police", "Municipal Police", "Municip…
## $ City <chr> "Anchorage", "Anchorage", "Anchorage", "Anchorag…
## $ State <chr> "Alaska", "Alaska", "Alaska", "Alaska", "Alaska"…
## $ Year <int> 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, …
## $ Month <chr> "January", "March", "March", "April", "April", "…
## $ Incident <int> 1, 1, 2, 1, 2, 1, 2, 1, 2, 3, 1, 2, 3, 1, 2, 3, …
## $ Crime.Type <chr> "Murder or Manslaughter", "Murder or Manslaughte…
## $ Crime.Solved <chr> "Yes", "Yes", "No", "Yes", "No", "Yes", "Yes", "…
## $ Victim.Sex <chr> "Male", "Male", "Female", "Male", "Female", "Mal…
## $ Victim.Age <int> 14, 43, 30, 43, 30, 30, 42, 99, 32, 38, 36, 20, …
## $ Victim.Race <chr> "Native American/Alaska Native", "White", "Nativ…
## $ Victim.Ethnicity <chr> "Unknown", "Unknown", "Unknown", "Unknown", "Unk…
## $ Perpetrator.Sex <chr> "Male", "Male", "Unknown", "Male", "Unknown", "M…
## $ Perpetrator.Age <int> 15, 42, 0, 42, 0, 36, 27, 35, 0, 40, 0, 49, 39, …
## $ Perpetrator.Race <chr> "Native American/Alaska Native", "White", "Unkno…
## $ Perpetrator.Ethnicity <chr> "Unknown", "Unknown", "Unknown", "Unknown", "Unk…
## $ Relationship <chr> "Acquaintance", "Acquaintance", "Unknown", "Acqu…
## $ Weapon <chr> "Blunt Object", "Strangulation", "Unknown", "Str…
## $ Victim.Count <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ Perpetrator.Count <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, …
## $ Record.Source <chr> "FBI", "FBI", "FBI", "FBI", "FBI", "FBI", "FBI",…
Data Key Terms:
Agency Type: Law enforcement Agency who handled the case
State/City: State and county (or city) of the reported homicide
Year/Month: Time stamp of the homicides
Crime Type: Designation of the case as murder, manslaughter, or negligence
Crime Solved: Whether the case has been solved or not
Victim Sex/Age/Race: Victim profile
Perpetrator Sex/Age/Race: Perpetrator profile
Relationship: The perpetrator's relation to the victim
Weapon: Weapon used to commit homicide
Case Open/Closed: Crime Solved expressed as a case status (closed if solved, open if not).
Solve Rate: Percentage of Homicide Reports where the case was closed
Top murder cases by state: we take the raw dataset and count the murder cases in each state.
# Count murder cases per state, largest first, shown as a scrollable table
data %>% group_by(State) %>%
  summarize(Murder_Count = n()) %>%
  arrange(desc(Murder_Count)) %>%
  kbl() %>% kable_paper() %>% scroll_box(height = "800px")
| State | Murder_Count |
|---|---|
| California | 99783 |
| Texas | 62095 |
| New York | 49268 |
| Florida | 37164 |
| Michigan | 28448 |
| Illinois | 25871 |
| Pennsylvania | 24236 |
| Georgia | 21088 |
| North Carolina | 20390 |
| Louisiana | 19629 |
| Ohio | 19158 |
| Maryland | 17312 |
| Virginia | 15520 |
| Tennessee | 14930 |
| Missouri | 14832 |
| New Jersey | 14132 |
| Arizona | 12871 |
| South Carolina | 11698 |
| Indiana | 11463 |
| Alabama | 11376 |
| Oklahoma | 8809 |
| Washington | 7815 |
| District of Columbia | 7115 |
| Arkansas | 6947 |
| Colorado | 6593 |
| Kentucky | 6554 |
| Mississippi | 6546 |
| Wisconsin | 6191 |
| Massachusetts | 6036 |
| Nevada | 5553 |
| Connecticut | 4896 |
| New Mexico | 4272 |
| Oregon | 4217 |
| Minnesota | 3975 |
| Kansas | 3085 |
| West Virginia | 3061 |
| Utah | 2033 |
| Iowa | 1749 |
| Alaska | 1617 |
| Hawaii | 1338 |
| Nebraska | 1331 |
| Rhodes Island | 1211 |
| Delaware | 1179 |
| Idaho | 1150 |
| Maine | 869 |
| New Hampshire | 655 |
| Wyoming | 630 |
| Montana | 601 |
| South Dakota | 442 |
| Vermont | 412 |
| North Dakota | 308 |
The table shows that California is the state with the most murder cases, with Texas coming in second and New York third.
Top 10 states with the highest number of cases:
## # A tibble: 10 × 2
## State Murder_Count
## <chr> <int>
## 1 California 99783
## 2 Texas 62095
## 3 New York 49268
## 4 Florida 37164
## 5 Michigan 28448
## 6 Illinois 25871
## 7 Pennsylvania 24236
## 8 Georgia 21088
## 9 North Carolina 20390
## 10 Louisiana 19629
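The code that printed the top-10 tibble is not shown. A minimal sketch of how it could be produced, assuming the raw data frame is called `data` as in the chunks above; `top_states()` is a hypothetical helper, and `count(sort = TRUE)` stands in for the earlier `group_by()` / `summarize()` / `arrange()` pipeline.

```r
library(dplyr)

# Hypothetical helper: count records per state and keep the k largest.
top_states <- function(df, k = 10) {
  df %>%
    count(State, name = "Murder_Count", sort = TRUE) %>%  # one row per state, largest first
    slice_head(n = k)                                      # keep only the top k rows
}

# Usage on the raw data: top_states(data, k = 10)
```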
What is the most used weapon? We count how many times killers used each kind of weapon to see their top weapon choices.
# Count how often each weapon appears, most used first
data %>% group_by(Weapon) %>%
  summarize(Most_Weapon_Used = n()) %>%
  arrange(desc(Most_Weapon_Used)) %>%
  kbl() %>% kable_paper() %>% scroll_box(height = "800px")
| Weapon | Most_Weapon_Used |
|---|---|
| Handgun | 317484 |
| Knife | 94962 |
| Blunt Object | 67337 |
| Firearm | 46980 |
| Unknown | 33192 |
| Shotgun | 30722 |
| Rifle | 23347 |
| Strangulation | 8110 |
| Fire | 6173 |
| Suffocation | 3968 |
| Gun | 2206 |
| Drugs | 1588 |
| Drowning | 1204 |
| Explosives | 537 |
| Poison | 454 |
| Fall | 190 |
Here you can see that the handgun is by far the most frequently used weapon: it appears in 317,484 of the 638,454 records, nearly half of all cases.
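To state "nearly half" precisely, each weapon's count can be turned into a share of all records. A minimal sketch, assuming the raw data frame `data`; `weapon_share()` is a hypothetical helper. On the full data the handgun share works out to 317,484 / 638,454 ≈ 0.497. Note that Firearm, Gun, Shotgun and Rifle are recorded as separate categories, so the handgun share does not cover all guns.

```r
library(dplyr)

# Hypothetical helper: number and share of records for each weapon category.
weapon_share <- function(df) {
  df %>%
    count(Weapon, name = "Cases", sort = TRUE) %>%  # most used weapon first
    mutate(Share = Cases / sum(Cases))              # fraction of all records
}

# Usage: weapon_share(data)
```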
Murder Cases in USA
Here we wanted to visualize the murder case counts across the US. Heatmaps are great for focusing on the locations that matter the most. In this map, California (CA) stands out in red compared to other states, while states in the northern US show lower counts.
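The code for the heatmap is not shown, so the following is only one possible way to draw a similar state-level map, not the authors' method (they load tigris and leaflet, which suggests a different approach). It is a sketch assuming the raw data frame `data`; `plot_state_map()` is a hypothetical helper, and it uses `ggplot2::map_data("state")`, which requires the maps package and covers only the contiguous states, so Alaska and Hawaii are not drawn. The dataset spells one state "Rhodes Island", which has to be recoded to match the map's "rhode island".

```r
library(dplyr)
library(ggplot2)  # map_data() below also needs the maps package installed

# Hypothetical helper: choropleth of record counts per state.
plot_state_map <- function(df) {
  state_counts <- df %>%
    mutate(State = if_else(State == "Rhodes Island", "Rhode Island", State)) %>%  # match map spelling
    count(State, name = "Murder_Count") %>%
    mutate(region = tolower(State))  # map_data("state") uses lowercase state names
  map_data("state") %>%              # polygons for the contiguous states only
    left_join(state_counts, by = "region") %>%
    ggplot(aes(long, lat, group = group, fill = Murder_Count)) +
    geom_polygon(color = "white") +
    coord_quickmap() +
    scale_fill_distiller(palette = "Reds", direction = 1) +  # darker red = more cases
    labs(title = "Murder cases by state, 1980-2014", fill = "Cases") +
    theme_void()
}

# Usage: plot_state_map(data)
```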
Murder Cases in California and Texas
Now let's focus on the best state in the US, Texas. Unfortunately, Texas comes in second in the number of murder cases. We wanted to see which county had the most cases: Harris County had the highest count.
We wanted to include California, since it has the highest count among all states, to see where most of its murders are.
California is broken down into cities instead of counties.
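The county and city breakdowns can be reproduced as a ranking of the `City` column within one state. A minimal sketch, assuming the raw data frame `data`; `top_areas()` is a hypothetical helper, and whether a `City` value is a county or a city depends on how each state reported it.

```r
library(dplyr)

# Hypothetical helper: the k areas (values of the City column) with the most
# records inside one state, e.g. counties in Texas or cities in California.
top_areas <- function(df, state, k = 5) {
  df %>%
    filter(State == .env$state) %>%   # .env$ refers to the function argument, not a column
    count(City, name = "Murder_Count", sort = TRUE) %>%
    slice_head(n = k)
}

# Usage: top_areas(data, "Texas"); top_areas(data, "California")
```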
Genders of victims by state: we count the number of cases for each victim gender in each state.
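The code behind this breakdown is not shown. A minimal sketch, assuming the raw data frame `data`; `gender_by_state()` is a hypothetical helper that returns one row per state and one column per victim gender, filling combinations that never occur with 0.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: case counts with one row per state and one column
# per victim gender (e.g. Female, Male, Unknown).
gender_by_state <- function(df) {
  df %>%
    count(State, Victim.Sex) %>%                      # long format: State, Victim.Sex, n
    pivot_wider(names_from = Victim.Sex, values_from = n,
                values_fill = 0L)                      # missing combinations become 0
}

# Usage: gender_by_state(data)
```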
Is there any missing or unknown data in this dataset?
# As an example, let's see how unknowns show up in the victim gender field
data %>% group_by(Victim.Sex) %>% summarize(Gender = n())
## # A tibble: 3 × 2
## Victim.Sex Gender
## <chr> <int>
## 1 Female 143345
## 2 Male 494125
## 3 Unknown 984
As this table shows, 984 victims have an unknown gender, so we need to tidy up our data and remove the unknowns.
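Instead of checking one column at a time, the "Unknown" entries can be counted in every text column at once. A minimal sketch, assuming the raw data frame `data` and that missing values are coded as the literal string "Unknown" (numeric placeholders such as a perpetrator age of 0 are not caught); `count_unknowns()` is a hypothetical helper.

```r
library(dplyr)

# Hypothetical helper: how many "Unknown" entries each character column has.
count_unknowns <- function(df) {
  df %>%
    summarise(across(where(is.character),                 # text columns only
                     ~ sum(.x == "Unknown", na.rm = TRUE)))
}

# Usage: count_unknowns(data)
```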
# How about unknown weapons?
data %>% group_by(Weapon) %>%
  summarize(Most_Weapon_Used = n()) %>%
  arrange(desc(Most_Weapon_Used)) %>%
  kable() %>% kable_paper() %>% scroll_box(height = "800px")
| Weapon | Most_Weapon_Used |
|---|---|
| Handgun | 317484 |
| Knife | 94962 |
| Blunt Object | 67337 |
| Firearm | 46980 |
| Unknown | 33192 |
| Shotgun | 30722 |
| Rifle | 23347 |
| Strangulation | 8110 |
| Fire | 6173 |
| Suffocation | 3968 |
| Gun | 2206 |
| Drugs | 1588 |
| Drowning | 1204 |
| Explosives | 537 |
| Poison | 454 |
| Fall | 190 |
The handgun is again at the top, and this table also shows 33,192 records where the weapon is “Unknown”, which we remove as well.
Let’s look at the distribution of cases by victim age in the raw data.
# Graph for cases by age
data %>% ggplot(aes(Victim.Age)) + geom_histogram(binwidth = 50) +
  labs(title = "How many cases over victims' ages?",
       x = "Age of Victim (years old)", y = "Cases")
From this graph, we see that there are many cases with victims nearly 1,000 years old. This does not make sense (such values are almost certainly a code for an unknown age), so we filtered our data to make it cleaner and more coherent.
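A quick way to see which implausible ages occur, and how often, is to list every victim age above a cut-off. A minimal sketch, assuming the raw data frame `data`; `implausible_ages()` and its cut-off of 100 are hypothetical choices mirroring the 1-100 range used in the filtering below.

```r
library(dplyr)

# Hypothetical helper: victim ages above a plausibility cut-off, with how
# often each one occurs, to spot placeholder codes for unknown ages.
implausible_ages <- function(df, max_age = 100) {
  df %>%
    filter(Victim.Age > max_age) %>%
    count(Victim.Age, name = "Cases", sort = TRUE)  # most frequent value first
}

# Usage: implausible_ages(data)
```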
## Rows: 346,656
## Columns: 5
## $ Year <int> 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, 198…
## $ Victim.Age <int> 14, 43, 43, 30, 42, 99, 20, 36, 31, 16, 33, 27, 33, 31, 2…
## $ Victim.Sex <chr> "Male", "Male", "Male", "Male", "Female", "Female", "Male…
## $ Relationship <chr> "Acquaintance", "Acquaintance", "Acquaintance", "Acquaint…
## $ Weapon <chr> "Blunt Object", "Strangulation", "Strangulation", "Rifle"…
In our filtered data, we kept only the variables we found useful for our findings and removed all the ‘unknowns’ in the dataset. The filtered data contains Year, Victim’s Age, Victim’s Sex, Relationship to their perpetrator, and the type of Weapon used in each incident. We also restricted victims’ ages to the plausible range of 1 to 100.
We wanted to observe the number of cases across the 35 years (1980-2014) covered by the data.
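The filtering code itself is not shown. The following is a hedged reconstruction of the steps described above, assuming the raw data frame `data` and that "unknowns" means the literal string "Unknown" in the three text columns; `filter_cases()` is a hypothetical helper and may not reproduce the exact 346,656 rows shown in the glimpse.

```r
library(dplyr)

# Hypothetical reconstruction of the filtering step described above.
filter_cases <- function(df) {
  df %>%
    select(Year, Victim.Age, Victim.Sex, Relationship, Weapon) %>%   # keep the 5 variables
    filter(
      if_all(c(Victim.Sex, Relationship, Weapon), ~ .x != "Unknown"), # drop unknowns
      between(Victim.Age, 1, 100)                                     # keep ages 1-100
    )
}

# Usage: filtered_data <- filter_cases(data)
```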
| Year | n |
|---|---|
| 1980 | 14384 |
| 1981 | 14184 |
| 1982 | 13896 |
| 1983 | 13184 |
| 1984 | 12597 |
| 1985 | 12432 |
| 1986 | 12993 |
| 1987 | 12283 |
| 1988 | 11674 |
| 1989 | 12415 |
| 1990 | 12916 |
| 1991 | 12771 |
| 1992 | 12625 |
| 1993 | 12961 |
| 1994 | 12198 |
| 1995 | 11242 |
| 1996 | 9822 |
| 1997 | 9151 |
| 1998 | 8266 |
| 1999 | 7372 |
| 2000 | 7063 |
| 2001 | 7373 |
| 2002 | 7782 |
| 2003 | 7665 |
| 2004 | 7580 |
| 2005 | 7654 |
| 2006 | 7742 |
| 2007 | 7519 |
| 2008 | 6690 |
| 2009 | 7247 |
| 2010 | 6944 |
| 2011 | 6730 |
| 2012 | 6716 |
| 2013 | 6329 |
| 2014 | 6256 |
Now, with our filtered data, we wanted to see the number of cases over the years 1980-2014. Case counts peak in 1980 (14,384) and again in 1993 (12,961), and then decline, falling to 6,256 in 2014, less than half the 1980 level. The high count in 1980 may be related to the severe global economic recession of that time, when inflation in the US peaked at 14.76%.
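The size of the decline can be computed directly from the yearly counts. Using the table above, the change is 100 × (6,256 - 14,384) / 14,384 ≈ -56.5%. A minimal sketch, assuming the `filtered_data` frame; `percent_change()` is a hypothetical helper comparing the earliest and latest years present.

```r
library(dplyr)

# Hypothetical helper: percent change in the number of cases between the
# earliest and the latest year present in the data.
percent_change <- function(df) {
  yearly <- df %>% count(Year) %>% arrange(Year)  # one row per year, oldest first
  n_first <- yearly$n[1]
  n_last <- yearly$n[nrow(yearly)]
  100 * (n_last - n_first) / n_first
}

# Usage: percent_change(filtered_data)
```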
# Number of murders per year, 1980-2014
filtered_data %>%
  group_by(Year) %>%
  summarise(murder = n()) %>%
  ggplot(aes(Year, murder)) + geom_point() + geom_smooth()
Then we wanted to see whether the perpetrator knew their victim before striking, so we made a variable based on their relationship.
## Rows: 346,656
## Columns: 6
## $ Year <chr> "1980", "1980", "1980", "1980", "1980", "1980…
## $ Victim.Age <chr> "14", "43", "43", "30", "42", "99", "20", "36…
## $ Victim.Sex <chr> "Male", "Male", "Male", "Male", "Female", "Fe…
## $ Relationship <chr> "Acquaintance", "Acquaintance", "Acquaintance…
## $ Weapon <chr> "Blunt Object", "Strangulation", "Strangulati…
## $ Relationship_with_murder <chr> "Known", "Known", "Known", "Known", "Known", …
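The code that created `Relationship_with_murder` is not shown. A hedged sketch of one way to build it, assuming (this is not confirmed by the source) that the "Stranger" relationship is the only category labeled "Unknown" and every other relationship counts as "Known"; `add_relationship_flag()` is a hypothetical helper.

```r
library(dplyr)

# Hypothetical reconstruction: label a case "Unknown" when victim and
# perpetrator were strangers, and "Known" for every other relationship.
add_relationship_flag <- function(df) {
  df %>%
    mutate(Relationship_with_murder =
             if_else(Relationship == "Stranger", "Unknown", "Known"))
}

# Usage: relationship_data <- add_relationship_flag(filtered_data)
```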
The counts below show that in most cases the victim knew their perpetrator: 253,368 cases are known versus 93,288 unknown (about 73% known).
# Count the cases where the victim and perpetrator knew each other
relationship_data %>% group_by(Relationship_with_murder) %>% summarise(cases = n())
## # A tibble: 2 × 2
## Relationship_with_murder cases
## <chr> <int>
## 1 Known 253368
## 2 Unknown 93288
Next, we count victims by sex in the filtered data and plot the result. Males are far more likely to be murdered than females: 257,294 male victims versus 89,362 female victims (about 74% male).
## Victim.Sex n
## 1 Female 89362
## 2 Male 257294
# Graph for Victim Sex
filtered_data %>% ggplot(aes(Victim.Sex, fill = Victim.Sex)) +
  geom_bar(color = 'black') + theme_bw() +
  # after_stat(count) replaces the deprecated ..count.. notation
  geom_text(aes(label = after_stat(count)), stat = "count", vjust = 5) +
  labs(title = "Which gender is the most targeted?",
       x = "Victim Gender", y = "Cases", fill = "Victim Gender")
Then we wanted to see how the victim’s gender relates to whether they knew their perpetrator.
# Victim gender, split by whether the victim knew the perpetrator
relationship_data %>% ggplot(aes(Victim.Sex, fill = Relationship_with_murder)) +
  geom_bar(color = 'black') +
  theme_bw() +
  # position_stack() places each count label inside its own stacked segment
  geom_text(aes(label = after_stat(count)), stat = "count",
            position = position_stack(vjust = 0.5)) +
  labs(title = "How many victims knew their murderer, by gender?",
       x = "Victim Gender", y = "Cases", fill = "Relationship with murderer")
From the graph above, it is clear that most victims knew their murderer before the incident.
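The stacked bars show counts, which makes it hard to compare genders of very different sizes. Converting the counts to shares within each gender makes the comparison direct. A minimal sketch, assuming the `relationship_data` frame; `known_share_by_sex()` is a hypothetical helper.

```r
library(dplyr)

# Hypothetical helper: within each victim gender, the share of cases where
# the victim knew (or did not know) the perpetrator.
known_share_by_sex <- function(df) {
  df %>%
    count(Victim.Sex, Relationship_with_murder) %>%
    group_by(Victim.Sex) %>%
    mutate(Share = n / sum(n)) %>%  # shares sum to 1 within each gender
    ungroup()
}

# Usage: known_share_by_sex(relationship_data)
```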
What does the distribution of cases by victim age look like?
filtered_data %>% ggplot(aes(Victim.Age)) +
  geom_histogram(color = 'black', fill = 'white', binwidth = 3) +
  labs(x = "Victim Age", y = "Cases")
We can see that victims are most often around 21-25 years old (the peak of the histogram, rather than the average age). However, this alone does not tell us whether age affects the rate of murder, because the histogram counts cases without accounting for how many people of each age there are. So, let’s take a closer look at this pattern.
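To separate the most common age from the average age, the ages can be grouped into 5-year bands and summarized. A minimal sketch, assuming the `filtered_data` frame (where `Victim.Age` is an integer between 1 and 100); `age_bands()` and `age_center()` are hypothetical helpers, and the 5-year band width is an arbitrary choice.

```r
library(dplyr)

# Hypothetical helper: victim ages grouped into 5-year bands, most common first.
age_bands <- function(df) {
  df %>%
    mutate(Age_Band = cut(Victim.Age, breaks = seq(0, 100, by = 5))) %>%  # (0,5], (5,10], ...
    count(Age_Band, name = "Cases", sort = TRUE)
}

# Hypothetical helper: mean and median victim age.
age_center <- function(df) {
  summarise(df, Mean_Age = mean(Victim.Age), Median_Age = median(Victim.Age))
}

# Usage: age_bands(filtered_data); age_center(filtered_data)
```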
Based on our findings, an important takeaway from our analysis is that in most cases the perpetrator knew the victim before committing the murder. Female victims are more likely to have known their killer. Also, men are more likely to be murdered than women. As for the underlying factors and motives, killers may each have different motives, such as a desperate need for money, sex, or power, and some may be prepared to kill again and again. Lastly, the most used weapon was a handgun. This makes us wonder whether stricter gun laws in each state would reduce homicides, given how easy it is to acquire one.